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Abstract —This work studies the prohlem of content-hased 
image retrieval, specifically, texture retrieval. Our approach em¬ 
ploys a recently developed method, the so-called Scattering trans¬ 
form, for the process of feature extraction in texture retrieval. It 
shares a distinctive property of providing a robust representation, 
which is stable with respect to spatial deformations. Recent 
work has demonstrated its capability for texture classification, 
and hence as a promising candidate for the problem of texture 
retrieval. Moreover, we adopt a common approach of measuring 
the similarity of textures by comparing the subband histograms 
of a filterbank transform. To this end we derive a similarity 
measure based on the popniar Bhattacharyya Kernel. Despite 
the popularity of describing histograms using parametrized 
probability density fnnctions, such as the Generalized Ganssian 
Distribntion, it is nnfortunately not applicable for describing 
most of the Scattering transform subbands, dne to the complex 
modulus performed on each one of them. In this work, we propose 
to use the Weibull distribution to model the Scattering snbbands 
of descendant layers. Onr nnmerical experiments demonstrated 
the effectiveness of the proposed approach, in comparison with 
several state of the arts. 

Index Terms —Content-based image retrieval, feature extrac¬ 
tion, Bhattacharyya Kernel, Scattering transform, similarity 
measure, Weibull distribution. 


I. Introduction 

A. Overview 

C ONTENT-BASED image retrieval (CBIR) is a special 
case of image classification. It can be viewed as the 
process of assigning a query image to a set of image classes, 
where each class represents a database image. Content-based 
hereby refers to the mode of formulating the search query. As 
opposed to metadata-based image search, which relies on pre¬ 
labeling the images beforehand, a CBIR system retrieves the 
n best matches within the database with respect to the visual 
similarity to the query image. 

In CBIR, the concepts of feature extraction (EE) and 
similarity measure (SM) play crucial roles. EE converts an 
image into a feature vector of numerical values with the aim to 
produce a low-dimensional, yet sensible representation of the 
input image in the context of some particular application. An 
SM assigns a numerical value to a pair of two feature vectors. 
In this work, we assume that the SM is nonnegative and that a 
smaller SM value indicates a higher similarity and vice versa. 
A typical CBIR system is depicted in Eig. [T] The images 
to be extracted are processed via an EE algorithm and the 
extracted feature vectors (signatures) are stored in a database. 
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Fig. 1. A typical architecture of content-based image retrival. 


The query image is fed to the same EE algorithm and the 
output is compared with all the feature vectors in the database 
by means of the defined SM. As a result, the images with 
the lowest SM values are returned. Consequently, designing 
a CBIR system boils down to the construction of an EE/SM 
framework. 

Nowadays, CBIR is employed to deal with big databases 
and vast amounts of data. Thus, in additional to high retrieval 
efficiency, execution speed is of great concern. This requires 
sufficient dimensionality reduction within the EE and a com¬ 
putationally efficient SM. 


B. Related Work and Our Contribution 

Extracting sensible information from images has been a 
field of research in digital signal processing for roughly half a 
century now. Naturally, the choice of the EE method depends 
on the specific application. Eor instance, when comparing 
images of landscapes, pictures taken in a forest will have 
different color distributions from those taken in a desert, so an 
EE based on the color properties of the image is a reasonable 
approach. On the other hand, images of objects are strongly 
characterized by shapes, so in this case the EE could rely on 
extracted contours. This work focuses on CBIR for textures. 
The earliest approaches for texture EE, such as the gray- 
level co-occurrence matrices (GLCM) |fT| and the biologically 
motivated proposed Tamura features Q have been presented 
in the 1970s and are often based on numerical measures for 
how nearby pixels relate to each other. 

More recent approaches include incorporating spatial- 
frequency representations of the images ||^, such as the East 
Wavelet Transform (EWT) Q, Gabor frames Q or Curvelet 
frames 0 - A conventional approach is designing the features 
based on the energy of the respective transform coefficients, 
but it can be beneficial to look at their statistical properties 
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0^ 0- Thus, FE approaches based on the histograms of 
hlterbank transforms have become more prominent, cf. 0- 
03- Techniques of this kind are referred to as Subband 
Histogram methods in the work. 

Linear hlterbank transforms feed the input image to a bank 
of frequency-selective hlters, yielding a set of band-pass sig¬ 
nals as a representation. One problem of constructing FE based 
on such a decomposition is that the higher frequency subbands 
are prone to deformations in the spatial domain. Thereby, 
even slight deformations of an image will yield signihcantly 
different transform coefficients. However, simply neglecting 
the higher-frequency subbands is not desirable since they carry 
important information about such distinctive features as edges 
and corners. As a remedy, Mallat introduced the Scattering 
transform |T§, a non-linear signal representation involving 
hlterbanks and the modulus operation which transforms high- 
frequency components into low-pass representations. The Scat¬ 
tering transform appears to perform well in texture classihca- 
tion tasks GZl-GD- The key contribution of this work is to 
dehne a statistical model in order to design subband histogram 
FE for texture images based on their Scattering transform 
coefficients. Furthermore, we introduce a similarity measure 
based on the extracted features, with the property of being 
a Kernel and thus can be interpreted as an inner product of 
Textures transformed to some Hilbert space. 

The paper is organized as follows. Section provides 
necessary mathematical preliminaries of the problem of CBIR, 
subband histogram methods, and the basics of the Scattering 
transform. In Section III we develop a statistical framework 
of CBIR based on the Scattering transform. In Section |IV] 
several numerical experiments are presented to investigate the 
performance of the proposed approach in comparison with the 
state of the art methods. 


II. Mathematical Preliminaries 
A. Notations 

Unless stated otherwise, a signal denotes an element of the 
Lebesgue space The terms almost everywhere and 

almost all refer to conditions which hold for all where 

Af can be any null set. For the sake of simplicity, we refer to 
as a Hilbert space of functions, even though it would 
be more precise to call them equivalence classes of functions 
that are equal almost everywhere. Bold-faced lowercase letters 
X or Xi describe vectors, while regular lowercase letters like 
X or Xi describe scalar values. Depending on the context, 
uppercase letters stand either for scalar values or for matrices. 
For a complex value z, z denotes its conjugate, and 3?(2;), 'As{z) 
the real and imaginary part, respectively. Angle brackets (•, •) 
denote the scalar product of and || • |j the canonical 

norm induced by it. An asterisk denotes the convolution f * g 
of two signals /, g. 


B. Subband Histogram Methods 

The coefficients {aii,..., aijv} produced by a transform of 
the query image can be viewed as a set of realizations of a 
random experiment. Let pd describe the probability density 


function (PDF) of the transform coefficients of some database 
image. 

The likelihood of a set of random realizations {xi,x^} 
for a PDF p is dehned as 

N 

L{xi,...,xn\p) = Y[p{x^), (1) 

i=l 

and is a measure for the probability that {xi,..., a;jv} is 
subjected to p. That is to say, L{xi, ...,XM\pd) can be viewed 
as a measure of similarity between the query image and the 
database images. Since the natural logarithm In is a monotone 
function, this is also true for the log-likelihood Ini. 

Let A denote the set of the PDFs of a number of database 
images. The best match for a query {xi,..., x^r} can thus be 
determined via 


p*d = argmaxlnL(xi, ...,XN\pd) 

PdeA 

N ( 2 ) 

= arg max ^ In (x^). 

Pd&A 

Expression Q is called a Maximum Likelihood (ML) solution. 

Assume the query {xi,...,XAr} is subjected to a PDF 
Pq. Then, by the law of large numbers, the log-Likelihood 
lnL(xi,..., xjvjpd) can be approximated by the negative 
cross-entropy 

- H{pq,pd) = / pq{x)\npd{x)dx. (3) 

Jpd{d:)A0 


Assuming a generative model, the cross-entropy H{pq,pd) 
dehnes a similarity measure between the respective query and 
database image. 

One of the most well-studied transforms in image pro¬ 
cessing is the multiresolution decomposition produced by the 
FWT. In particular, it is known that the histograms of its band¬ 
pass subbands can be modeled by the Generalized Gaussian 
Distribution (GGD), c.f. | [20) , dehned as 


PGGD(a;|a,/3) 


P -(|x|/a)'3 

2ar(l//3) 


(4) 


where F denotes the Gamma-Function, 


m = 


X* “ dx, 


(5) 


and the parameter a G M+ determines the scale of the 
distribution while /? G K+ describes the shape. The cross¬ 
entropy between two GGDs ist given by 


-ff(PGGD(a;|ai,^i),PGGD(a:|Q!2,/32)) 



-f InF 



/aiY^ r((/32 + l)//3i) 
[a2j r(i/^i) 


( 6 ) 


Equation (|^ provides a parametrized SM for texture images. 
The GGD parameters can be estimated from each image using 
FE based on ML which together with the SM in (|^ dehnes 
a complete framework for CBIR. This approach was explored 
and discussed in 0> though it was equivalently formulated 
in terms of the Kullback-Leibler divergence (KLD) rather 
than the cross-entropy. It can be viewed as a blueprint for 
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Other subband histogram methods for which the motivation is 
twofold. First, the FWT is not the only filterbank transform 
for images and is not necessarily the best basis for texture FE. 
Furthermore, even though the cross-entropy can be rigorously 
motivated by ML optimization, it lacks any geometrical inter¬ 
pretation, which poses the question, whether there are more 
suitable SMs for parametrized probability models. 

C. The Scattering transform 

1) Definition: Let 6 G L^(R^) be a rotationally symmetric 
window function with low-pass characteristics. Let rf G \ 
{0} and J G Z be hxed and 7Z C SO(2) a hnite group 
of rotation matrices. With 'fi(x) = we define the 

wavelet as 

fi,,Rix) = 4-^fii2-mx), j G{J,J- 1 ,...}, RGTZ. (7) 


Let us denote by Vm the (ordered) set of all possible paths p 
of size 0 to M. We can then define the Finite-path WST as 

= {S,p,,ij,j,R[f;p]}p(^VM- ( 14 ) 

2) Properties: The output signals of the Linite-path WST 
are the elements of ([T^, also referred to as WST subbands in 
the following, which are all low-pass signals due to the hl- 
tering with fij. With increasing M, "Pm] captures 

more and more energy of the input signal / and the WST 
representation becomes more expressive as the maximum path 
length grows. Let the Scattering norm be defined as 

I|5'0,.0,j,k[/;Pm]|| 5 = ^ ||-S'0,v.,j.7?,[/;p]|P- (15) 

P&Vm 

As a consequence of the tight frame property (|^ of D, the 
infinite-path WST is unitary, i.e. 


Lurther, let us assume a low-pass and rotationally symmetric 
scaling function |j^ G and define 

cl)j{x)=^~'^(j){2~-’x), ( 8 ) 

such that for the respective Fourier transforms (j), 

J 

|^(2M|2+ ^ ^ |^(2^i?,^)|2 = l (9) 

j — — oo RgTZ 

holds, for almost all a; S Then the set 

= {0j(m - a;)}„gH2 

constitutes a Parseval-tight frame It will be assumed to 
span in the following. Note that for two signals f,g, 

the equation 

{f*g){u) = {f{x),g(u-x)) ( 11 ) 


holds. 

At the core of the Windowed Scattering transform (WST) 
fl^ is a dyadic wavelet decomposition j, i?] of the 

input signal / G with the complex modulus performed 

on the band-pass components, dehned as 


[/; j, 7?] 


IV'i.i?*/!, j <-7, 
<^j * /, j > J- 


( 12 ) 


The modulus operation | • | traverses some of the energy of 
the band-pass signals towards lower frequencies. Therefore, 
can be applied to the output signals IV'j.R * f\ again. 
This yields a tree structure like the one depicted in Lig.|^ Note 
that ( [T2| ) yields an inhnite (but countable) number of output 
signals. However, in practice we deal with bandlimited input 
signals /. Hence, there exists an integer Ji with J; < J, such 
that Eq. needs to be evaluated for j > Ji only, which 
corresponds to a tree with hnite breadth. 

Basically, the idea of the WST is to apply succes¬ 

sively to the input signal and only keep the low-pass signals, 
i.e. to neglect signals represented by the black nodes in Lig. 
1^ The WST along the path p = (O'l, 7?i),..., (j„, 77^)) of 
scaling factors and rotations is dehned as 


= fij* * I • • • * \ fiji,Ri */l • • ■ II- (13) 


||5'<^,j/>,j.7?,[/;7^o 


Il5 = 


lim ||5'0 .0 j,7?.[/;T’m]||s 
M—foo 

ll/ll- 


(16) 


In practice, maximum path depths of M = 3 are expected 
to capture all essential information p2) , | |23) . Actually recon¬ 
structing / from its WST involves a phase recovery problem. 
However, it is known p4) that digital realizations of wavelet 
representations similar to those employed in the WST are 
uniquely determined by their complex modulus. 

The WST is non-expansive i.e. for two signals /i, / 2 , 
it can be shown m that for any positive integer M, 

I|5'0,^,j,7?,[/i;’Pm] — Pm]||5 

= X/ ~ (17) 

PClVm 

<||/l-/ 2 f 


holds, implying that the WST is robust with respect to additive 
noise. 

Since WST is a representation consisting solely of low- 
pass signals, it is stable with respect to small spatial trans¬ 
lations and deformations. Specihcally, let / be a deformed 
or translated version of /. Under certain mild assumptions 
about the underlying wavelet frame, it has been proven that 
the error ||S' 0 ,v>.J, 7 ?.[/; Pm] - 'Pm]|| is bounded 

asymptotically, cf. GD- Note, that we are given some freedom 
of choice in the wavelet frame which allows for considerable 
Hexibility in terms of parameters such as frequency selectivity 
or directionality. 

Due to its stability and hexibility, the WST is a convenient 
signal representation and can be used as the foundation for the 
LE from images. 

3) Normalized WST: In order to reduce redundancy and 
to increase invariance to distortions, the Normalized WST 
(NWST) was introduced. Let p G Vm be the prede¬ 
cessor of p G Vm, i-e. p = ((ji, 7?i),..., (j^, 7?„)) implies 
P — ((jlj 7?l), ..., {jm — l, 7?m —l))- 

Let ip denote a very narrow-band lowpass blurring hlter. Lor 
the layers m > 1, the NWST is dehned as 


^ip,^,tp,J,Ti[f',P\ — 


\f\*V 

S 4 ,,,p,j,-R.lf-,p] 

. S’,#.,.,/,,j,-R,[/;p] 


if p G Pi, 
otherwise. 


(18) 
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(^^)<—|i^0,i/.,j.rej (^ — ^,i,,ip,.j,'i^ — ^,I,,,I,,.J,T^ — ^(i>,ip,j,T^ j.-rej — ^4,,ip,j,i^ (^ — ^,i,,-ii,,j,'R^ 



Fig. 2. WST tree produced by successive application of on the input signal /. Lowpass signals (The WST subbands) are depicted as white nodes. 

The complex moduli of band-pass signals are depicted as black nodes. 


In words, each subband of the WST is normalized by the 
respective parent subband, except for the subbands in the first 
layer which are normalized by the mean of the modulus of 
the input signal. In practice, a small constant is added to the 
denominator in order to avoid division by zero. 

III. Statistical Scattering CBIR 
A. Subband Modeling 

We propose to model the gray-value distributions of the 
different WST subbands with parametrized PDFs and describe 
the images in terms of their respective parameters to obtain a 
complete FE mechanism on top of the WST. 

The most distinctive features in textures are those of higher 
frequencies and are thus carried by the layers m > 1. These 
layers contain signals of the form 

fpJ * * I • • • * IV'ji.fti * /I • • ■ II, Pt^Po- 

Clearly, the modulus eliminates all negative values which is 
why a symmetric model such as the GGD is not appropriate 
anymore. The histograms of signals from descendant layers 
suggest a PDF that occupies only positive values and can 
describe a skew to the left, which makes the Weibull Dis¬ 
tribution (WD) a nearby choice. In fact, the WD was already 
successfully employed in the modeling of complex wavelet 
coefficients GD . Fig.|^exemplarily shows histograms of WST 
subbands of different textures with a fitted WD curve. Despite 
variations in skewness and spread, the WD fittings describe the 
actual histograms fairly well. The PDF of the WD is defined 
for a; > 0 as 

Pwu{x\X,k) = Xk ■ . (20) 
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Fig. 3. Histograms (blue) and respective ML Weibull fittings (red) of WST 
subbands from ayers m = 1 and m = 2 for different texture patches from 
the UIUC texture database 


for the layer m = 0, we neglect the impact of the low-pass 
filter (j)j. The approximately analytical band-pass filters 
produce signals with similar zero-mean distributions in their 
real and imaginary parts. Moreover, it is known that band¬ 
pass components of a natural image can be well modeled 
by a GGD. This is also a reasonable assumption for band¬ 
pass components of Scattering subbands since they resemble 
natural images under dim lifghting conditions. It is therefore 
natural to assume the GGD with the same parameters as a 
model for the distribution of the real and imaginary parts of 
the band-pass signals * I • • • * I'^ji.rti * /I' ’ ’ I involved 

in the WST. By neglecting statistical dependence between real 
and imaginary components, let us define a “complex GGD” 
with statistically independent real and imaginary parts as 


Analogous to the GGD, A S M+ is the scale parameter, 
whereas k G M+ determines the shape. 

The above argument is purely empirical. The following 
discussion aims to justify further the choice of the WD as 
a model for WST subbands ([T^ in the layers m > 1: Like 


PGGGD{z\a, P) = PGGD{^{z)\a, l3)pGGD{Xs{z)\a, P). (21) 

In order to determine the PDF for the modulus cc = |z| > 0, we 
need to fix | 2 :| and integrate pcGGD over the circles centered 
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at 0 in the complex plane; 

PCGGD,abs(x|a,/3) = / pcGGD(-2|a,/3)dz 

, , 2 . (22) 

\aT{l/(3)J Jo 

Evaluating the integral analytically is demanding, if not im¬ 
possible. However, it is known that the modulus of a complex 
variable with statistically independent and Gaussian distributed 
real and imaginary parts abides the Rayleigh distribution, 
which can be easily verified by substituting a = s/2a and 
/? = 2 into Eq. ( [22l i, i.e. 

PCGGD,abs(a;|'s/2cr, 2) = \ (23) 

The usage of the GGD for modeling EWT subbands was 
motivated by the need to model histograms which are more 
or less peaked in shape than it is the case for the Gaussian 
distribution. Eor this purpose, the additional shape parameter /3 
was introduced. This corresponds to introducing an additional 
shape parameter to the Rayleigh distribution. Altering the 
peakedness of the real and imaginary part corresponds roughly 
to altering the skewness of the respective modulus distribution. 
Erom Eq. ( |22| l, we can observe that pcGGD,abs(0|Q!, /3) = 0, 
so a good alternative to pcGGD.abs for modeling the WST 
subbands would be a two-parameter generalization of the 
Rayleigh distribution which is controllable in its skewness 
without changing its value at 0. Eor k > 1, these requirements 
are fulhlled by the WD. 

Even though it is sensible to model the WST with its 
Weibull coefficients, it is still questionable if this decision is 
justihable for the NWST It is certainly true for the hrst layer 
since it only involves an overall scaling. Unfortunately, this can 
not be assumed for the other descendant layers. Nevertheless, 
experiments show, that in practice this assumption still holds. 
However, for the most important ranges of k, the multiplicative 
inverses of WD distributed samples exhibit histograms which 
again can be well modeled by the WD as can be seen 
in Eigure Thus, the normalized coefficients in ( fTS) ! for 
m > 2 are products of values whicht are close to be Weibull 
distributed and statistically independent. Thus the WD will 
again dominate the subband histograms. 

We employ a numerical approach to estimate the Weibull 
parameters via minimizing the corresponding ML function. 
The derivation of the algorithm can be found in Appendix [A| 



Fig. 4. Histogram of Weibull distributed samples and their inverses 


2) Weibull Similarity: Our aim is to derive a simple 
approximation of for a pair of Weibull distributions 
PwD(a:|Ai, fci),pwD(a;|A 2 , fe). To this end, let us assume that 
ki « k 2 - This is a justihable assumption since the distributions 
are always compared for each subband individually. Let us 
further dehne 


ki k2 j , k -f A 2 

k = - and a = \ — - 

2 V 2 

This ultimately leads to 

-BC'(PwD(a:|Ai,fci),pwD(a;|A2,fc2)) 

poo 

= / s/Pwbiix\ki, \i) * Pwhiix\k2, X2)dx 
Jo 

= J\\^\l^s/kJk 2 / x’^-^e- -- dx 

Jo 

I - 

\\X'^s/hk 2 J dx 

= y XiX^y/kik 2 / —j^p^h\ix\k,X)dx 
\J AiA 2 'Jk]k2 


=4 


A^ -f A 2 ki k2 


(25) 


(26) 


Eor the sake of convenience, let us write the arithmetical and 
geometrical mean of two values yi,y 2 G K+ as 


B. Scattering Similarity 


1) The Bhattacharyya Kernel: Both the cross-entropy and 
the KLD can be derived for the WD in order to dehne an 
SM similar to the on discussed in Subsection |II-B| cf. | |^ . 
However, the derivation leads to very complex expressions. 

As an alternative, we propose an SM based on the Bhat¬ 
tacharyya coefficient. Eor a pair of PDEs p, q, it is dehned 
as ^00 

BC{p,q)= / s/p(x)q{x)dx.. (24) 


Expression ( |24l i is a popular choice for Kernels in the machine 
learning community p6) and as such imposes a Hilbert space 
structure on probability based features. 


/ia(yi,y 2 ) = and /ig(yi,?/ 2 ) = s/mn, (27) 

respectively. Erom the last equation of ( |26] l we de¬ 
hne our similarity measure for pairs of Weibull PDEs 

PvvD(a::|Ai,fci),pwD(a::|A 2 ,fc 2 ) as 


K{Xi, fci; A 2 , fe) 


Tg{X\,X^) ^ pg{ki,k2) 
Ba{X\,X^) Pa[ki,k2)' 


(28) 


In the derivation of K, we replace ki and k 2 by k each time 
it appears as the exponent of Ai and A 2 , respectively. Even 
though this step is not necessary to eveluate the integral in 
( |2^ , it ensures a straight-forward understanding of ( |28] l, with 
the hrst factor measuring the similarity of the scale and the 
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second factor measuring the similarity of the shape. Moreover, 
just like ( |24l l, the expression ( |28| l defines a Kernel for Weibull 
PDFs, as it is shown in the following proposition. 

Proposition 1. Expression ( |28[ ) is a Kernel for Weibull pa¬ 
rameters. 

Proof: Note that for k = ki = k 2 the approximation in 
(|26ll becomes an equality. I.e., 


BC{pwD{x\Xi,k),pwD{x\X 2 , k)) =K{Xi,k; X 2 , k) 

_ Pg{X\,X’f) (29) 

pa{X\ , A2) 

is a Kernel, since it is a Bhattacharyya coefficient of two PDFs. 
So is 

BC{pwD{x\ki, l),pwD(a;|A:2,1)) = (30) 

Pa{ki,k2) 

because both the shape and the scale parameter of the WD 
are constrained to be from the same domain K+. Therefore, 
Expression ( |28] l is the product of two Kernels and as such a 
Kernel itself. ■ 

The Kernel property of ( |28l l provides us with some ge¬ 

ometrical intuition, hence makes it a good choice as the 
fundamental building block of an SM. This includes the fact 
that it naturally induces a metric, which incorporates the notion 
of similarity as a geometric distance as is for instance the 
motivation behind multidimensional scaling Ez)- 

It also enables the (N)WST to be conveniently subjected 
to many machine learning techniques such as Support Vector 
Machines. 

3) Similarity for Multiple Subbands: Consider two PDFs 
p, q of two statistically independent random variables X, Y, 

P{x,y) =Pi{x)p 2 {y), q{x,y) = qi{x)q 2 {y) (31) 

It can be easily seen that 

BC{p,q) = BC{pi,qi)BC{p 2 ,q 2 )- (32) 

Carrying over this insight to the derivation ( |26l l yields, for a 
pair of sets of N independent WDs with the parameter vectors 


N 

K{X\k^-,X\k^) = Y[K{Xy,,kl,p,X2,^,k2,^). (33) 

i=l 

For a pair of WST transforms with N subbands, it is therefore 
straightforward to derive an SM. Since by our definition a low 
SM indicates a high similarity we apply the logarithm and 
reverse the sign which finally leads us to 

5Mscat(A\fc^;A",fc2) 

^ (34) 

= - ^ IniTjAl.i, kix, X2,^, k2,i). 

i=l 


C. Implementing the Framework 

The two previous sections provide the theoretical basis for 
building a complete CBIR based on WST Subband histograms. 
In order to extract the features, the WST (Section |II-C| l is 
performed on each image. Each subband of the layers m > 1 
is then subjected to ML in order to extract the WD parameters. 
If the number of subbands is N, this amounts to feature vectors 
of size 2N. In order to compare the query feature vector to the 
feature vectors in the database, the SM ( [34l l us employed. As 
most of these steps can be performed independently of each 
other, the whole system is well suited to be implemented in a 
parallel manner, for example in applications for Big Data. 


IV. Numerical Experiments 


A. Experimental Settings 

1) Implementation: All experiments were performed in a 
Matlab 2014a environment. The code reproducing the key 
results is available onlin^ The WST was performed by 
means of the ScatNe0 toolbox, version 0.2. The Weibull 
parameters were extracted via wblfit from the Statistical 
Toolbox. Since it is the most standard subband histogram 
technique, we also implemented the EWTh-GGDh-KLD method 
according to Q or comparison. Additionally, we implemented 
a method based the Dual-Tree Complex Wavelet Transform 
(DT-CWT), inspired by GD- The DT-CWT was performed 
by the DT-CWT Transform Pacl^ version 4.3. Similarly the 
subband histograms were modeled by the WD, but the KLD 
was replaced by the proposed SM in ( [34| ) for the sake of 
comparability. 

2) Choice of parameters: Along with the regular WST, all 
experiments were also conducted with its normalized version 
as defined in ( fTSl l. The directionality parameter, corresponding 
to the cardinality of the rotation group TZ, is fixed to be L = 4. 
All computations were performed with double precision and 
the subbands of the WST are critically sampled according 
to the Shannon-Nyquist sampling theorem. The bandwidth 
of the low-pass filter fj is chosen in a way such that it 
produces signals of size 16 x 16 = 256, in accordance with 
0. The WST is evaluated for the maximum path lengths 
M = 2 and M = 3. In theory, a linear increase in M 
leads to an exponential increase in the number of subbands. 
However, since the modulus operation hardly traverses any 
energy toward higher frequencies, they become negligible with 
increasing path length. Apart from the mentioned, default 
settings of ScatNet were used. In particular, the Morlet Wavelet 
was used as the band-pass filter ijj on each layer of the WST. 


The EWT relies on the D2 4-tap Daubechies wavelet |211 
as the underlying multiresolution representation. The DT-CWT 
was run with the options antonini and qshift_a. In both 
cases, the size of the smallest subband was 16 x 16, which 
corresponds to three levels of decomposition. 


'https://www.ldv.ei.tum.de/uploads/media/texture_retrieval_scattering_14.zip 
^http://www.di.ens.fr/data/software/scatnet/ 
^http://www-sigproc.eng.cam.ac.uk/Main/NGK 







7 



Fig. 5. Database D1 from left to right and top to bottom, with row and column in brackets: BarkO (1,1), Bark6 (1,2), BarkS (1,3), Bark9 (1,4), Brickl (1,5), 
Brick4 (1,6), BrickS (1,7), Buildings9 (1,8), FabricO (2,1), Fabric4 (2,2), Fabric? (2,3), Fabric9 (2,4), Fabricll (2,5), Fabricl4 (2,6), Fabricl5 (2,7), Fabricl? 
(2,8), Fabricl8 (3,1), Flowers5 (3,2), FoodO (3,3), Food5 (3,4), Food 8 (3,5), Grassl (3,6), Leaves8 (3,7), LeaveslO (3,8), Leavesll (4,1), Leavesl2 (4,2), 
Leavesl6 (4,3), MetalO (4,4), Metal2 (4,5), Misc2 (4,6), SandO (4,7), Stonel (4,8), Stone4 (5,1), TerrainlO (5,2), Tilel (5,3), Tile4 (5,4), Tile? (5,5), Water5 
(5,6), Woodl (5,7), and Wood2 (5,8). 


3) Data: The database D1 is the same as used in 0 and is 
based on the VisTex databas^ It consists of 40 texture images 
of the size 512 x 512, each divided into 16 non-overlapping 
128 X 128 patches. This amounts to 640 image samples of 
40 different textures depicted in Fig. The database D2 
was generated from images of the following subset of the 
UIUC texture databas^ Barkl, Bark2, Wood2, Wood3, Water, 
Marble, Floorl, Floor2, Pebbles, Wall, Brickl, Glassl, Glass2, 
Carpetl, Carpet2, Wallpaper, Fur, Knit, Curdoroy, Plaid. Two 
640 X 480 images from each class are used to create five 
overlapping 256 x 256 patches which are then scaled down 
to half the edge size. Hence, we get a database of 20 different 
texture classes each containing ten 128 x 128 patches. Fig. 
depicts one patch from each utilized image. All image patches 
are normalized to zero mean and unit energy, in order to 
avoid any bias caused by the overall lighting condition of each 
original texture image. The set of all patches generated from 
the same texture is considered a class. Its cardinality will be 
denoted by c in the following. Consequently, c = 16 for D1 
and c = 10 for D2. For each image patch, the c—1 most similar 
patches were retrieved. The retrieval rate for each patch is 
defined as the ratio of the number of retrieved patches from 
the same class to c — 1. The overall retrieval rate is the average 

“^httpi/Zvismod.media.mit.edu/vismod/imageryA^isionTexture/distribution.html 

^http://www-cvr.ai.uiuc.edu/ponce_grp/data/ 


FWT+GGD 

+KLD 

FWT+GFD 

+KLD 

WD-HMM 

Rotated Wavelets 

76.93% 

78.40% 

80.05% 

82.81% 


TABLE II 

Performance of state of the art methods on Database D1 


of the retrieval rates for all the images in the database. 

B. Results 

1) Overview: The retrieval rates are summarized in Table 

While WST-bWD-bS'Mscat produces a similar result as DT- 
CWTH-WD-bS'Mscat on Database Dl, NWST-bWD-bS'Mscat 
is able to outperform all of the competing frameworks by 
4.72% for M = 2 and 5.12% for M = 3. Database Dl is 
widely used as a benchmark for CBIR retrieval. In order to 
provide a sense for the state of the art. Table |II] summarizes 
the results from recent publications on comparable approaches: 
DWT + Generalized Gamma Distribution (GFD) | fT3l , Wavelet 
Domain Hidden Markov Models (WD-HMM) flO) and Ro¬ 
tated Complex Wavelets | |28) . Also, the original result for 
FWTh-GGDh-KLD from 0was included, since we were not 
able to reproduce it. To the authors’ best knowledge, the best 
published result for this experiment so far was produced by 
Rotated Complex Wavelets with a retrieval rate of 82.81% 

















Fig. 6. Database D2 (patches) from left to right and top to bottom: Barkl, Bark2, Wood2, Wood3, Water, Marble, Floorl, Floor2, Pebbles, Wall, Brickl, 
Glass 1, Glass2, Carpetl, Carpet2, Wallpaper, Fur, Knit, Curdoroy, Plaid. 



FWT+GGD 

+KLD 

DT-CWT+WD 

+<S’Mscat 

WST+WD,M=2 

+<S'M'scat 

WST+WD,M = 3 
+5Mscat 

NWST+WD,M = 2 
+SMscat 

NWST+WD,M = 3 
+5Mscat 

Database Dl 

75.50% 

78.18% 

78.93% 

78.14% 

84.90% 

85.30% 

Database D2 

52.39% 

59.61% 

66.50% 

63.78% 

66.94% 

65.17% 


TABLE I 

Performance of WST + WD and NWST + WD in comparison with FWT + GGD and DT-CWT + GGD on databases 1 and 2 


which is still outperformed by 2.09% by NWST+WD+^Mscat 
with M = 2 and 2.49% with M = 3. 

Database D2 is considerably smaller than Dl, but involves 
more variation within the classes, for instance, in terms 
of camera angle and deformation. Again, the WST greatly 
improves the retrieval performance. However, this time the 
regular WST does not fall behind the NWST. Also, increasing 
the maximum path length up to M = 3 harms the perfor¬ 
mance. However, we can conclude that NWST-nWD-nS'Mscat 
performs comparably well and produces the best results in 
both our test settings. 

2) Impact of Directionality: In general, natural textures 
contain strongly directional features. Hence, it is often ben¬ 
eficial to incorporate a frame of higher directionality as it 
is done for the Rotated Wavelets approach, for example. 
The DT-CWTH-WD-HiSMscat method which mostly draws its 
benefits from increased directional selectivity in comparison 
with standard FWT based techniques is consistently and 
significantly outperformed by NWSTH-WD-nS'Mscat. despite 
the same directionality properties. That is to say, there are 



Fig. 8. Retrieval Rate 


reasons to believe that the success of the WST based methods 
is not exclusively due to increased directionality. 
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Fig. 7. Retrieval rates for each texture class of D1 individually. From left to right, with index in brackets: BaikO (1), Bark6 (2), BarkS (3), Bark9 (4), Brickl 
(5), Brick4 (6), Brick5 (7), Buildings9 (8), FabricO (9), Fabric4 (10), Fabric7 (11), Fabric9 (12), Fabricll (13), Fabricl4 (14), FabriclS (15), Fabricl7 (16), 
FabriclS (17), FlowersS (18), FoodO (19), Food5 (20), Food8 (21), Grassl (22), Leaves8 (23), LeaveslO (24), Leavesll (25), Leavesl2 (26), Leavesl6 (27), 
MetalO (28), Metal2 (29), Misc2 (30), SandO (31), Stonel (32), Stone4 (33), TerrainlO (34), Tilel (35), Tile4 (36), Tile7 (37), Water5 (38), Woodl (39), and 
Wood2 (40). 


3) Impact of Textural Structure: To farther analyze the 
capabilities of the involved methods, consider Fig. It shows 
the retrieval rates for each class in Database D1 produced by 
DT-CWT as well as by WST and NWST with M = 2. DT- 
CWT + GGD performed well in comparison to the WST based 
techniques for the classes Brick5, Buildings9, MetalO and 
Wood2 and badly for Bark9, Fabric4, LeaveslO and Stonel. In 
general, it appears that DT-CWT is most suitable for flat and 
(piecewise) smooth surfaces, while the two methods involving 
Scattering seem to be preferable for surfaces that are rather 
uneven, be it because they are rough or because they are bent 
or dented. This correlates with the premise that Scattering 
coefficients are less sensitive to spatial deformations, since 
on the one hand, images of rough surfaces carry significant 
fractions of high-frequency components and on the other hand 
spatial deformation in the image is often caused by locally 
bending or denting the depicted material. More specifically, 
NWST performed comparably well for the classes BarkO, 
Bark6, FlowersS, TerrainlO and regular WST for the classes 
Barks, Fabricll, Grassl, Stone4. This suggests that WST 
works better on fine structures while NWST yields better 
results for coarse structures. 

This assumption is corroborated by the following experi¬ 
ment. Fig. depicts the retrieval rates for the Database D2 
after blurring it with a Gaussian Alter of different band- 
widths. Unsurprisingly, the retrieval performance declines with 
stronger blurring for all methods. However, it is evident that 
the retrieval rate for WST declines significantly faster than 
the other two. This can be interpreted as a consequence of 
the distortion invariance of normalized Scattering coefficients. 
More importantly, this confirms that the regular WST method 
relies on fine features since they are most heavily affected by 
blurring. On the other hand, images of coarse patterns and flat 


and piecewise smooth surfaces are less affected by blurring 
such that the other two methods appear to be relatively robust. 


V. Conclusion and Outlook 

In this work, we propose a subband histogram FE method 
based on the statistics of the WST of a texture image. Our 
derivation and analysis demonstrate how to extract statistical 
features from a WST representation of a texture image by 
means of matching a Weibull model via ML and how to 
measure the similarity of the respective feature vectors via the 
KLD. The proposed techniques come in handy when it comes 
to reducing the enormous degree of redundancy introduced 
by the WST since they represent each subband by as much 
as two real numbered parameters. The experiments show that 
the proposed method can outperform comparable algorithms 
based on Alter bank transforms in terms of retrieval accuracy 
when designed as a CBIR system. Regular WST coefficients 
seem to work better on fine structures and Normalized WST 
coefficients on coarse structures. 

Since approaches for rotation invariant Scattering represen¬ 
tations are already available 113, |Tg, it appears feasible 
to extend the presented ideas towards a rotation invariant 
CBIR framework which would allow for more general problem 
settings. 

The summation in p4| ) was partly motivated by assuming 
statistical independence of the subbands. However, neither 
does statistical independence hold for the WST subbands, nor 
for the subbands of other fllterbank transforms, in practice. 
Accounting for statistical dependence in the model could lead 
to significant improvements in efficiency 
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Appendix 


A. Estimation of Weibull parameters 

In this section, we employ a numerical optimization ap¬ 
proach to estimate the Weibull parameters. A more detailed 
derivation can be found in | |29) . The log-likelihood amounts 
to 


fwD{k,\)= InLwoixi, ■ ■ ■ ,XN\k, X) 

= Nk In A -I- A^ In fc 

N N ( 35 ) 

+ {k- l)^lna;i - 

i-l i=l 

The optimal parameters {k*,X*) are identified by solving 


{k*,X*) = argmax/wD(fc, A). (36) 

(fe,A) 


The first derivatives of fwD with respect to k and A are 
computed as 


dfwD 

dk 


and 


\ J 

N 

- A'=^(lnxi)xf 

Z=1 


In A 


(37) 


dfwD _ , (^ 

aA “ A 



(38) 


Setting Eq. ( |38l ) to be equal to 0 uniquely determines A* for 
k > 0. Solving for A yields 


(39) 

Substituting ( [39] l in Eq. ( [JT] ! and equating it with 0 characterize 
the critical points of fwD in terms of k for A = A*, i.e. 


N 

J 


N 

-I- In Xi - 

i=l 






= 0 . 


(40) 


Since the function fwD as defined in Eq. ( [35l l is concave with 
respect to fc > 0, a numerical algorithm such as the Newton- 
Raphson method can be used to obtain k*. Substituting this 
solution into Eq. ([39ll yields A*. 
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